Single-nucleus RNA-sequencing of autosomal dominant Alzheimer disease and risk variant carriers

Genetic studies of Alzheimer disease (AD) have prioritized variants in genes related to the amyloid cascade, lipid metabolism, and neuroimmune modulation. However, the cell-specific effect of variants in these genes is not fully understood. Here, we perform single-nucleus RNA-sequencing (snRNA-seq) on nearly 300,000 nuclei from the parietal cortex of AD autosomal dominant (APP and PSEN1) and risk-modifying variant (APOE, TREM2 and MS4A) carriers. Within individual cell types, we capture genes commonly dysregulated across variant groups. However, specific transcriptional states are more prevalent within variant carriers. TREM2 oligodendrocytes show a dysregulated autophagy-lysosomal pathway, MS4A microglia have dysregulated complement cascade genes, and APOEε4 inhibitory neurons display signs of ferroptosis. All cell types have enriched states in autosomal dominant carriers. We leverage differential expression and single-nucleus ATAC-seq to map GWAS signals to effector cell types including the NCK2 signal to neurons in addition to the initially proposed microglia. Overall, our results provide insights into the transcriptional diversity resulting from AD genetic architecture and cellular heterogeneity. The data can be explored on the online browser (http://web.hararilab.org/SNARE/).

in triggering receptors expressed on myeloid cells 2 (TREM2), a gene that drives microglia activation, increase AD risk 10 . Recent GWAS have identified additional common genetic risk factors for AD, including rs1582763, an intergenic variant in the MS4A locus that confers resilience to AD and is associated with higher cerebrospinal fluid (CSF) soluble TREM2 levels [11][12][13][14] . However, linkage and association studies do not reveal downstream transcriptional ramifications nor the cell types these variants influence.
Single-nucleus RNA-sequencing (snRNA-seq) has emerged as a powerful approach to interrogating the underlying transcriptional landscape of the cellularly-complex human brain. SnRNA-seq has been used to study AD in different cohorts and brain regions, including the entorhinal cortex 15 , temporal neocortex 15,16 , prefrontal cortex 17,18 , and the dorsolateral prefrontal cortex (DLPFC) 16,19,20 . However, no singlecell study has analyzed genetic high-risk variants and mutations in the AD parietal cortex. A systematic and unbiased survey of cell typespecific gene expression across this region will help identify transcriptional states associated with the high Aβ plaque and tangle burden but relatively low atrophy in the early stages of AD, which are characteristic of the parietal cortex 21 .
In this study, we selected participants from the Dominantly Inherited Alzheimer Network (DIAN) and Charles F. And Joanne Knight Alzheimer Disease Research Center (Knight ADRC) biobank to enrich our cohort with genetically defined AD patients carrying pathogenic, risk, and resilience genetic variants to leverage the naturally occurring perturbations of these genes and elucidate their role in AD pathogenesis. Specifically, we performed snRNA-seq on the parietal cortex (Brodmann areas 7 and 39). After stringent filtering, we obtained 294,114 high-quality nuclei from 67 individuals. We iteratively extracted nuclei by cell type (digital sorting), subclustered the nuclei into cell type transcriptional states (cell states), and identified differentially expressed genes (DEGs) by cell states. We identified differences in celltype composition associated with ADAD, APOEε4, TREM2, and MS4A carriers and identified DEGs between the genetic groups and control samples. Finally, we created an online browser (http://web.hararilab. org/SNARE/) for convenient, unrestricted access to the snRNA-seq expression data.
We independently replicated the genetically driven transcriptional and proportional profiles identified in the parietal cortex using snRNA-seq data of the DLPFC from the ROSMAP cohort 20 , snRNA-/ snATAC-seq data of the prefrontal cortex from UCI MIND's ADRC 17 , and single-cell RNAseq of 5xFAD mouse microglia 22 (see details in Methods).
Our findings highlight the power of leveraging genetic and singlecell molecular data to understand the heterogeneity of pathways, biological processes, and cell types mediating AD genetic risk factors.

Results
A single-nucleus atlas captures the transcriptional diversity among sporadic and autosomal dominant AD Detailed clinical data, postmortem neuropathological data, and genetic characterization of the discovery cohort are described in Table 1. We generated whole transcriptomes of the parietal cortices at single-cell resolution using 10x Genomics Next GEM technology. After data cleaning and quality control (QC; see Methods), we retained 67 samples and obtained molecular data for 16 carriers of pathogenic mutations in APP and PSEN1 (autosomal dominant AD; ADAD), 31 sAD non-carriers of risk-modifying variants, three individuals who matched AD-neuropathological criteria but without clinical cognitive impairment at age-of-death (presymptomatic), eight individuals who matched non-AD neurodegenerative pathologies criteria (other), and nine individuals who exhibited neither neurodegenerative pathology nor evidence of dementia (control) ( Table 1). Within the cohort, 41 samples carried the minor allele (A) for rs1582763, an SNP in the MS4A gene cluster 11 Fig. 1a, and Supplementary Dataset 1).
After subclustering each cell type, we identified five to nine subclusters or cell-type transcription states (cell states) within each cell type (Supplementary Dataset 2). These cell states were generally well represented among samples and batches (Supplementary Fig. 1 and Supplementary Dataset 5). Cell states flagged for review (<half the total entropy possible) were later identified as either having few nuclei or being enriched for carriers of genetic factors (i.e., ADAD, MS4A, or TREM2). Neuronal cell states were then categorized as excitatory (EN) or inhibitory (IN; See Methods). We capture an average of 2696 upregulated genes (out of an average of 16,223 genes tested) that passed multiple testing correction in each cell state compared to all other cell states in the same cell type (Supplementary Datasets 7,8,9), which reveals a rich transcriptional diversity within all brain cell types. As expected, due to statistical power, there was a positive correlation between the number of detectable DEGs and the number of nuclei

Shared transcriptomic profiles across AD groups and cell types
We leveraged the snRNA-seq data to compare the cell-type-specific gene expression patterns of the sAD, TREM2, and ADAD groups compared to controls (Supplementary Dataset 9). ADAD donors have more underexpressed than overexpressed genes for each cell type.
Conversely, TREM2 samples have more overexpressed than underexpressed genes (Supplementary Datasets 10, 11). Astrocytes, excitatory neurons, and OPCs show a trend for transcriptional underexpression across groups; in contrast, microglia and oligodendrocytes show overexpression (Fig. 2a). This suggests that, in general, astrocytes, excitatory neurons, and OPCs lose functionality overall while microglia and oligodendrocytes increase in functionality in AD. A notable exception is the TREM2 group's OPCs, which generally show overexpression compared to controls. We then compared whether genes were coincidentally differentially expressed across AD groups ( Fig. 2b and Supplementary Fig. 3). In many instances, sAD and ADAD have the same direction of effect on  differential gene expression. However, the effect tends to be stronger in ADAD (Supplementary Fig. 4). We also identified group-specific gene patterns of interest. Within ADAD astrocytes, there was overexpression of genes SLC7A5, LRP2, and SLC7A1 related to "transport across blood-brain barrier" (Benjamini-Hochberg p = 4.09 × 10 −2 ). Also within ADAD astrocytes, was the overexpression of ITGA10, SER-PINF2, P3H2, PLOD3, PLOD1, and ADAMTS12 related to "extracellular matrix organization" (Benjamini-Hochberg p = 1.35 × 10 −2 ) which is concordant with other evidence implicating astrocytes in matrisome perturbation 26 . Across AD groups in excitatory neurons, we identified overexpression of EHD1, DGAT2, LRP5, and LDLRAP1 relating to 'cholesterol homeostasis' (Benjamini-Hochberg p = 3.99 × 10 −2 ). Unique to TREM2 OPCs, we captured overexpression of genes PDGFRB, PFKP, and PDK1 relating to "Central carbon metabolism" (Benjamini-Hochberg p = 3.94 × 10 −2 , Supplementary Dataset 12).
In astrocytes, SULT1A2 and SQSTM1 were overexpressed in sAD, TREM2, and ADAD groups compared to controls (Supplementary Fig. 5 and Supplementary Datasets 15,16). SULT1A2 catalyzes the sulfate conjugation of hormones, neurotransmitters, drugs, and xenobiotics. The SQSTM1 gene encodes p62, a protein involved in the signaling for multiple pathways relating to proteasomes, autophagy, oxidative stress, inflammation, and immune response 27 . It has been proposed as , and sAD by cell type. a Ridge plots showing the distribution of gene estimates extracted from the linear mixed models comparing control nuclei to ADAD, TREM2, and sAD nuclei within cell types. Only DEGs that passed multiple testing correction in at least one genetic group were considered. For inclusion, the gene must also have been nominally significant (p < 0.05) in the genetic group. The total number of genes is shown on the right tail. b Heatmaps of gene estimates from the same models emphasize the divergent and congruent expression patterns across genetic groups. The largest 500 estimates were selected per cell type and used to create full heatmaps (found in Supplement). "Modules" were manually created based on expression patterns and dendrogram groupings. The top 10% of genes from each module were extracted (order preserved) to produce the above heatmaps. "sAD sig.", "TREM2 sig.", and "ADAD sig." depict the significance status of each gene for that group. "BH": Benjamini-Hochberg p < 0.05, "Nominal": p < 0.05, "NS": not significant. Source data are provided as a Source Data file. a therapeutic target to facilitate amyloid-β removal by autophagy activation 28 .
In microglia, the genes FLT1 and PTPRG were overexpressed in sAD, TREM2, and ADAD, whereas FARP1 was overexpressed in sAD, ADAD, and APOEε4+ (see Methods) samples. FLT1 mediates microglial chemotactic inflammatory responses, which contribute to pathological conditions in the AD brain 29 . PTPRG is a tyrosine phosphatase, and a family-based GWAS hit for AD risk 30 . FARP1 is involved with synapse formation, and the retention of a specific intron within this gene is associated with AD in mouse brains 31 .
The gene SNTG, which is associated with the age of onset in PSEN1 p.E280A carriers 32 , was dysregulated in excitatory neurons of ADAD and APOEε4+ samples. In oligodendrocytes, LPL and VWA3B are both overexpressed in TREM2 and ADAD compared to controls and ADAD compared to sAD. LPL, lipoprotein lipase, is increased in AD brains and associated with AD progression 33 . Mutations in VWA3B cause a recessive form of Spinocerebellar Ataxia 34 .
In OPCs, CNTNAP2 is overexpressed in the ADAD and APOEε4+ groups and underexpressed in the TREM2 group. PTPN13 is overexpressed in the TREM2 and ADAD groups compared to controls and in ADAD compared to the sAD group. CNTNAP2 is required for jap junction formation. It is an AD GWAS hit linked to neurodevelopmental disorders, including Tourette syndrome, autism, schizophrenia, epilepsy, OCD, and ADHD 35 . PTPN13 is a tyrosine phosphatase involved in tau tyrosine phosphorylation 36 . These genes highlight core biological functions dysregulated across multiple genetic backgrounds.

MS4A resilience variant carriers show a specific inflammatory microglial activation state
Carriers of rs1582763-A, an intergenic allele associated with reduced risk for AD and higher CSF sTREM2 levels 14   a, e Proportion plots show a significant (p < 0.05) enrichment (*) of cell states in TREM2 reduced activation carriers (TREM2 R ) compared to all other samples as determined by linear regression. Exact p values can be found in Supplementary Dataset 6. The proportion was calculated for each sample (see Methods). For visualization, sample proportions are averaged by group. "Other" represents all sAD non-TREM2 reduced activation carriers, including carriers of other TREM2 variants. b Barplot shows the expression of both resting and activated microglia marker genes in Mic-reduced (mic.2) compared to the Mic-resting (mic.0) and Micactivated cell states (mic.1). Expression was corrected for the age of death and sex using partial residuals. c UMAP plots showing the integrated nuclei from the discovery cohort and ROSMAP, split by cohort. d Violin plot of cell state expression signatures in the ROSMAP nuclei. The signature was calculated from the upregulated genes from the discovery cohort. Differences in signature scores were calculated using linear regression. (*) = p < 0.05, (**) = p < 5.0 × 10 −3 , (***) = p < 5.0 × 10 −20 ; exact p values can be found in the Source Data file. f Barplot shows the log2 fold-change of TFEB by oligodendrocyte cell state. g Identification of gene regulatory networks (GRN) in Oligo-TFEB discovery and replication cohorts. Regulons were filtered to include only those identified in both cohorts (p = 9.98 × 10 −41 ; hypergeometric analysis) with significant differential expression in Oligo-TFEB. Then those regulons with significant (Benjamini-Hochberg p < 0.05; hypergeometric analysis and Benjamini-Hochberg multiple testing correction) coincidence in the underlying target genes between cohorts were selected. h Gene regulatory network for transcription factors (TF; shown in blue) replicated in both discovery (purple edges) and ROSMAP (orange edges) datasets for oligo-TFEB. Replicated target genes and edges are shown in yellow and green respectively. Genes within AD GWAS loci are highlighted in red. g, h (*) = Benjamini-Hochberg p < 0.05, (**) = Benjamini-Hochberg p < 0.01. Source data are provided as a Source Data file. . These genes encode for related receptors in the TGF-β superfamily, which is implicated in multiple neurological disorders 60 . Mic-proinflammatory also showed increased expression of TMEM163, an AD GWAS gene 61 . TMEM163 is involved in transporting zinc into cells where the zinc influences reactive oxygen species (ROS) levels 62 . Evidence suggests ROS causes genomic damage to neurons leading to cell death in AD 63 . A gene ontology analysis also showed an upregulation of genes related to "cytokine response/production" (Benjamini-Hochberg p = 2.89 × 10 −9 ) and "regulation of ERK1 and ERK2 cascade" (Benjamini-Hochberg p = 2.37 × 10 −7 ; Supplementary Dataset 20b). In addition, carriers of rs1582763-A had decreased proportions of Astro-resting (Astro.0) and a trend towards increased proportions of Astro-activated (Astro.1) (Fig. 5d, Supplementary Fig. 7f, and Supplementary Dataset 6e). The MS4A genes are not expressed in astrocytes, suggesting cellular crosstalk synchronizes microglia and astrocytes' activation.

Genes within risk AD GWAS loci show variable expression between transcriptional states
GWAS meta-analyses have successfully identified genomic loci associated with sAD [11][12][13] . We sought to leverage the glial and neuronal cell states to help map genomic loci implicated in the general AD population onto genes and cell types. This is based on the rationale that genes differentially expressed between cell states might be functionally relevant to that cell type regardless of the relative expression of the gene compared to other cell types. We curated a list of genes in loci identified in recent AD GWAS, starting with 89 genes previously prioritized 13,61,69 (77 measured in this study) from 38 different loci (Supplementary Dataset 22a); we extended the list by adding the "nonprioritized" genes within these AD GWAS associated loci also present in this dataset (530 genes; Supplementary Dataset 22b, c).
Out of the 77 prioritized genes, 68 (from 37 loci) showed differential expression between the cell states of at least one cell type ( Fig. 6a and Supplementary Fig. 13). We also found that for 13 genes, the cell type with the greatest expression level variability (log foldchange between cell states) did not match the cell type with the highest average expression. For example, CR1 showed the highest average expression in oligodendrocytes, but microglia and OPCs had larger log 2 fold changes in expression (log 2 FC) between cell states. This suggests that transcriptional states can provide an additional layer of information to help map GWAS genes to cell types. Rare coding variants in PLCG2, implicated in TREM2-dependent microglial function in AD 70 , have been recently reported to protect against AD 71 . We found differential expression of PLCG2 between cell states within all cell types, suggesting an extended functionality beyond the TREM2 signaling pathway in microglia (Fig. 6) 72,73 . These PLCG2 expression differences were largely driven by the significant overexpression detected in ADAD brains across all cell types except microglia, which was nominally significant (Supplementary Dataset 10).
We accessed snRNA-/snATAC-seq data from 20 additional brains (prefrontal cortex) from the UCI MIND's ADRC 17 . Using the snRNA-seq data, we replicated the expression and fold-change patterns identified in the discovery cohort (see Methods). Specifically, we observed a high correlation between cell-type-specific gene expression (Pearson R = 0.93; p = 5.65 × 10 −211 ; Fig. 6b) and an overall concordance in the DEG by cell-type calling (Fisher exact test OR = 7.88, p = 4.09 × 10 −24 ; Supplementary Fig. 14 and Supplementary Dataset 23) for the prioritized genes between cohorts. All cell types, except OPCs, individually had a significant concordance in DEG calling.
Next, we sought to determine if genetic variants identified in AD GWAS showed chromatin co-accessibility with the prioritized genes, thus providing an independent layer of evidence relating genetic variants to genes and cell types. We employed the snATAC-seq from UCI 17 Fig. 6 | Transcriptional states complement chromatin accessibility to help prioritize causal cell types and genes mediating AD GWAS risk variants. a Overview of prioritized risk genes by AD GWAS studies and their transcriptional changes across cellular states (absolute log 2 fold-change in any cellular state for cell type). The "Locus" row indicates genes within the same locus using alternating black and gray rectangles. The color of the squares represents the max log 2 foldchange of the gene between cell states (subclusters) of that cell type (gray: not significant). The square size represents the average log10 transformed gene expression. Borders represent co-accessibility (red background) or overlap (black outline) between the TSS and a regulatory element containing a prioritized (95% credible set) AD variant. Background color intensity corresponds to the highest posterior probability of association (PPA) of the 95% credible set variants overlapping the TSS or co-accessible element. b Replication of the parietal lobe differential expression results using the UCI prefrontal cortex snRNA-seq data. OR: Fisher exact test odds-ratio (replicated vs. non-replicated). c Distribution of genes co-accessible or with their TSS overlapping a regulatory element (snATAC-seq narrow peak) containing a fine-mapped AD risk genetic variant. The color indicates the number of cell types the co-accessibility signal was detected in. d Chromatin accessibility signals across cell types for the BIN1 locus. The lead variant is represented by a red vertical bar, and the fine-mapping PPAs are plotted for each variant with PPA >0.01. TSS regions co-accessible with variant-overlapping regulatory elements are plotted as arcs below each signal track. e Same visualization as (d) for the NCK2 locus. Source data are provided as a Source Data file. and fine-mapped AD GWAS credible sets 61 to analyze cell-type-specific chromatin accessibility between the transcription start sites (TSS) of the prioritized genes and the variants in the AD GWAS credible sets. We determined that microglia cells had the highest fraction of genes coaccessible with AD risk variants, as was previously reported 17 . A large fraction (34.5%) of these genes only had co-accessibility signals within microglia, indicating that they likely mediate AD genetic risk in a celltype-specific manner (Fig. 6c). For example, rs6733839 in the BIN1 locus is co-accessible with the BIN1 TSS only in microglia (Fig. 6d). This is concordant with our predictions using transcriptional data (Fig. 6a and Supplementary Dataset 8), which showed that BIN1 is underexpressed in Mic-activated compared to both Mic-resting and Micproinflammatory (MS4A associated), suggesting that the role of BIN1 in AD risk is dependent on microglia activation.
A recent study suggested that microglial cells mediate AD genetic risk at the NCK2 locus (lead variant rs143080277, MAF = 0.0054), principally based on the higher expression of NCK2 in microglia 74 . We also observed that NCK2 expression is highest in microglia ( Fig. 6a and Supplementary Fig. 12); however, we identified the largest NCK2 expression differences within excitatory neuron cell states (max. linear mixed effects regression fold-change = 2.02). Analyses of chromatin accessibility show that rs143080277 is co-accessible with the NCK2 promoter in excitatory and inhibitory neurons, but not microglia (Fig. 6e). Altogether, these results highlight that DE between cell-typespecific transcriptional states provides additional information to help prioritize target genes and cell types mediating genetic risk at noncoding loci, which is complementary to other evidence, including celltype-specific chromatin accessibility.

Discussion
We performed snRNA-seq on nuclei from the understudied parietal cortex from a cohort enriched in carriers of AD genetic variants in APP, PSEN1, TREM2, APOE, and the MS4A cluster (rs1582763-A, an intergenic resilience variant in the MS4A locus associated with increased levels of soluble TREM2 in CSF and reduced AD risk 14 ).
In this cross-sectional study, we identify genes commonly dysregulated across AD groups, including FARP1, SQSTM1, CNTNAP2, LPL, and SNTG in microglia, astrocytes, OPCs, oligodendrocytes and excitatory neurons respectively. These represent core genes and pathways perturbed across a wide range of genetic backgrounds and are, therefore, powerful potential targets for therapeutic intervention. We also noted that the parietal lobes of sporadic and autosomal dominant AD brains show similar transcriptional dysregulation in general, but the effect is usually stronger in ADAD. This supports the premise of the amyloid hypothesis, that the molecular changes in ADAD are also found in sAD, and shows that the ADAD cohorts provide unique opportunities to study the underlying biology of the disease.
It is worth noting that we observed cell states enriched within ADAD samples for all cell types, with the Mic-stress state almost exclusively present in these carriers. We cannot rule out the possibility that this specificity could be due to younger ages in ADAD (an average of 32.9 years younger) compared to sAD. However, upon integrating microglia and oligodendrocyte snRNA-seq data from the parietal cortex with that from the DLPFC, we observed that this transcriptional state is also present in sporadic late-onset AD (Fig. 4c, g). In sAD, the parietal cortex is affected in the later stages of disease progression, whereas the DLPFC is affected earlier 75,76 . Therefore, the cell states identified in the parietal cortex of ADAD participants could represent accelerated stages of pathology like the DLPFC in late-onset sAD, consistent with the elevated tau PET found in the parietal region of ADAD compared to sAD patients 77 . Additional studies capturing multiple brain regions with varying degrees of pathology will provide the data to understand these observations in more detail.
TREM2 p.R47H, p.R62H, and p.H157Y show reduced cell activation in vitro 20,45 . Here, we report microglia and oligodendrocyte cell states associated with TREM2 risk variant carriers. TREM2 is mainly expressed in microglia. However, loss of function mutations in TREM2 cause Nasu-Hakola disease, which is characterized by white matter changes, including loss of myelination, suggesting oligodendrocyte-microglia crosstalk 78 . Therefore, it is likely that AD patients with TREM2 variants had altered microglia behavior which changed the microenvironment, driving the oligodendrocyte cell state found in TREM2 variant carriers. The microglial cluster associated with these reduced activation variant carriers displayed a resting-like state, suggesting they might benefit from treatments targeting TREM2 to prevent its cleavage, increasing microglial activation 79 ; however, not all TREM2 risk variants are functionally equivalent 14,80,81 . We also observed a microglia state expressing activation markers enriched for homozygous carriers of the MS4A resilient variant. In contrast to the major activated microglia state, this cell state showed upregulation of proinflammatory genes and cytokine signaling. A better understanding of this microglia resilience state could improve efforts to induce therapeutic microglial activation.
APOEε4 carriers had decreased proportions of nuclei in Mic-PNN and IN-axonogenesis, expression states that both influence axon regeneration and brain plasticity. The disruption of these cell states could be a contributor to the faster cognitive decline associated with the ε4 allele. Pathway analysis implicates ferroptosis in the loss of INaxonogenesis cells suggesting this pathway warrants further investigation in APOEε4 carriers.
Most AD GWAS hits are non-coding variants that potentially influence gene expression. By inspecting the cell state differential expression results, we found that multiple cell types might mediate some genetic loci's effect. For instance, both differential expression and co-accessibility analyses linked the NCK2 GWAS hit to neurons in addition to the initially proposed microglia. This example highlights the benefit of using cell state DE as an additional layer in linking GWAS signals to genes and cell types.
These analyses are limited by the rarity of mutations in APP, PSEN1, and the low frequency of variants in TREM2 in the general population. As a result, samples with mutations in APP and PSEN1 were merged and considered as a single ADAD group and three TREM2 variants (p.R47H, p.R62H, and p.H157Y) were merged as the TREM2reduced activation group despite there being slight differences in the functional mechanisms. Further analyses on additional tissue samples from carriers of these variants are needed to fully uncover the variantspecific effects in these critical AD genes.
Disease heterogeneity could explain the failure of many promising clinical trials to meet their endpoints 82 . The mechanisms driving AD heterogeneity are complex and rarely studied. Here, we show that genetic variants influence cell expression states, and, therefore, could explain some disease heterogeneity. The molecular characterization of the genes and pathways driving these cell states elucidates the functional mechanisms driving disease heterogeneity and possible targets for therapeutic intervention in the era of personalized medicine.
In conclusion, our findings support that single AD risk variants can influence the transcriptional landscapes of multiple brain cell types. Pathogenic mutations in APP and PSEN1 altered the profiles of neurons, but more especially glia when compared to controls and sAD. TREM2 risk variants shifted microglial and oligodendrocytic profiles and the MS4A resilience variant inflated a proinflammatory microglia profile. Each of these changes can modify AD's pathological progression and clinical manifestations.

Processing of brain tissue samples
The Washington University Translational Human Neurodegenerative Research (THuNDR) laboratory, which serves as the Neuropathology Core for the Knight Alzheimer Disease Research Center (Knight ADRC) and the Dominantly Inherited Alzheimer Network (DIAN) study, provided the postmortem parietal lobe tissue samples. These samples were obtained with informed consent for research use and approved by the Washington University School of Medicine in St. Louis Institutional Review Board. According to the National Institute on Aging-Alzheimer's Association (NIA-AA) criteria, AD neuropathological changes were assessed. Their demographic, clinical severity, and neuropathological information are presented in Table 1. RIN scores were used to evaluate bulk RNA quality for each sample before snRNAseq library preparation 83 (Supplementary Dataset 24).

Nuclei isolation and snRNA-seq on the 10X Genomics platform
The frozen human parietal cortex samples were processed according to the "Nuclei extraction and library preparation" protocol described by ref. 23. Briefly, the tissue was homogenized, and the nuclei were isolated using a density gradient. The nuclei were then sequenced using the 10X Chromium single cell Reagent Kit v3, targeting 10,000 cells per sample and 50,000 reads per cell for each sample.
SnRNA-seq data processing with 10X genomics CellRanger software and data filtering The CellRanger (v3.0.2 10X Genomics) software was employed to align the sequences and quantify gene expression. We used the GRCH38 (release-93) reference to prepare a pre-mRNA reference according to the steps detailed by 10X Genomics (References-3.0.0). The software was packaged into a Docker container (https://hub.docker.com/r/ ngicenter/cellranger3.0.2), allowing us to launch it within the McDonnell Genome Institute (MGI) infrastructure, reducing the computing time for generating the BAM files. Four samples failed in library prep and sequencing, leaving 70 samples to pass to QC.
Filtering and QC were done using the Seurat package (v3.1.2) on each subject individually. Each raw gene expression matrix for each sample was plotted using BarcodeInflectionsPlot to calculate the inflection points derived from the barcode-rank distribution. Thresholds were selected to isolate uniform regions of the distribution (Barcode-rank Distribution, Table 2, Supplementary Fig. 15, and Supplementary Dataset 25). Once the thresholds were determined, a subset of the data were isolated. We removed nuclei with high mitochondria gene expression following the dynamic model proposed by ref. 19. Briefly, the nuclei were grouped by their percentage of mitochondria values using k = 2 clustering, and the group with the higher percentage values was removed. Genes not expressed in at least ten nuclei were removed from the final matrix. To detect and discard doublets, we used DoubletFinder 84 (v1.0.0), which removes nuclei with expression profiles that resemble synthetically mixed nuclei from the dataset. The gene expression matrices from all samples were combined in R independently for further processing using the Seurat protocol. One sample was removed during this process due to low nuclei counts, leaving 69 samples.

Dimensionality reduction, clustering, and UMAP
The merged expression matrix was normalized using the SCTransform protocol by Seurat. This function calculates a model of technical noise in scRNA-seq data using "regularized negative binomial regression" as described previously in ref. 85. We regressed out, during the normalization, the number of genes, the number of UMIs, and the percentage of mitochondria. The principal components were calculated using the first 3000 variable genes, and the Uniform Manifold Approximation and Projection (UMAP) analysis was performed with the top 14 PCs. The clustering was done using a resolution of 0.2.

Cluster annotation and quantification of regional and individual contributions to cell types
We employed a list of marker genes we had previously curated 23 to annotate brain snRNAseq data. We used the DotPlot function (Seurat package) to visualize the average expression of genes related to specific cell types. This approach enabled the labeling of cell types based on the overall expression profile of the nuclei, regardless of dropout events. In addition, we employ a supervised method termed Garnett 86 (v0.1.14) that leverages machine learning to classify each nucleus and estimate cluster homogeneity. This method also provides a metric of gene ambiguity, which enables further optimization of the marker genes to be included in the classification process. For this method, we employed SYT1, SNAP25, and GRIN1 to classify neurons, NRGN, SLC17A7, and CAMK2A for excitatory neurons and GAD1 and GAD2 for inhibitory neurons; AQP4 and GFAP for astrocytes; CSF1R, CD74, and C3 for microglia; MOBP, PLP1, and MBP for oligodendrocytes; PDGFRA, CSPG4, and VCAN for oligodendrocyte precursor cells (OPCs); CLDN5, TM4SF1, and CDH5 for endothelial cells and ANPEP for pericytes. We employed the function check_markers (Garnett package) to evaluate the ambiguity score and the relative number of cells for each cell type. A classifier was then trained using the marker file, with "num_unknown" set to 50. This classifier annotates cells with cell-type assignments extended to nearby cells using the "clustering-extended type" labeling option. At this stage, one ambiguous cluster and one subject-specific cluster were dropped. One sample was predominantly in the ambiguous cluster, and another was predominantly in the subject-specific cluster, so those samples were removed entirely, leaving 67 samples. Distributions of UMI counts, gene counts, and percentage of mitochondrial reads for each sample are shown in Supplementary Fig. 16.

Identification of alternative cell-type transcription states
The nuclei within the primary cell-type clusters were each isolated from the full dataset and re-clustered. We re-normalized the data subset using the same protocol as explained in section 4 in methods. The number of PCs used for UMAP dimensionality reduction was different for each cell type, 4, 8, 10, 6, and 5, for neurons, oligodendrocytes, microglia, astrocytes, and OPCs, respectively. We then employed Seurat's FindNeighbors and FindClusters functions to identify unique cell states or subclusters (resolution = 0.1, 0.2, 0.2, 0.05, and 0.15). Additionally, we used the Garnett protocol to examine nuclei in each expression state within each cell type to detect and remove those nuclei that did not resemble a trustworthy expression profile from downstream analyses. After this final stage of QC, we ended with 67 of the 74 brains and 294,114 of the 1,102,459 nuclei. Distributions of UMI counts, gene counts, and percentage of mitochondrial reads by cell state are shown in Supplementary Fig. 17.

Sample and batch entropy by clusters and cell state
To evaluate the sample and batch effects in clustering, the sample and batch entropies were calculated for clusters and cell types for the full snRNA-seq dataset and the cell states for each cell type 87 . Shannon entropy was used to calculate individual cluster entropies (Eq. 1), and the weighted sum was used to calculate overall clustering entropy (Eq. 2) 88 . The entropies were normalized by dividing each entropy value by the maximum entropy possible for each scenario (batch (n = 14) = 3.81; sample (n = 67) = 6.07) (Supplementary Dataset 5).
Excitatory and inhibitory neuron classification Neuronal marker genes SLC17A7 and GAD1 from ref. 25. were used to classify the individual neuron cell states as either excitatory (EN) or inhibitory (IN), respectively ( Supplementary Fig. 18). Expression of these genes was log2 transformed, averaged by sample within each cluster, and then averaged by cluster to get a final score for each cluster. Downstream analyses were performed within these new classifications. Gene markers for the cortical layer were also from Lake et al. to link each neuronal cell state to a probable cortical layer ( Supplementary Fig. 18).

Differential proportion analysis
We employed linear regression models testing each individual's cell state compositions to identify associations between cell-type transcriptional states and disease groups (ADAD, sAD, TREM2, TREM2_reduced, rs158276). More explicitly, the number of nuclei a subject had in a specific cell state was divided by the subject's total nuclei count for that cell type creating a proportion (Eq. 3). The proportions were normalized using a cube root transformation and were corrected by sex, age of death, and disease status depending on the variable of interest. We removed participants who contributed fewer than 60 nuclei to the cell type cluster. The TREM2 and APOE analyses only included the 31 sAD samples. We utilized glm, a standard function in R, to implement the model (Eqs. 4-8).
rs1582763ðAdditiveÞ : ðProportionÞ APOEε4 + : ADAD : sAD : To visualize these results, cell state proportions were averaged between the samples in a group and displayed in a stacked barplot using the ggplot2 (v3.3.6) library in R. The location and density of nuclei for these groups in the UMAP space were also visualized using a modified version of SCANPY's embedding_density functions that extended its functionality to quantitative variables (rs1582763 allele counts). This modified code can be found at https://github.com/ HarariLab/parietal-snRNAseq 89 .

Cell state differential expression
To determine if there was unique functionality or potentially altered cell states due to disease, we fitted a linear mixed model that predicted the expression level of each gene for the individual nuclei by cell state and corrected for the subject of origin and sex (Eq. 9) 90 . Control, sAD, and ADAD samples were used to calculate cell-state differentially expressed genes. Expression levels were extracted from the Seurat objects using GetAssayData with the 'slot' parameter set to "counts". Age of death (AOD) was not included in the model because AOD is correlated with ADAD status. The R package nebula 91 (v1.1.5) was used to implement the model, including parameters for a zero-inflated negative binomial distribution (model = "NBLMM", method = "LN") and the random effect of the subject of origin 91 . The number of UMI's per nuclei was already accounted for during SCTransformation, so the model did not need to account for the number of UMI's.
Expression ∼ CellState + Sex + ð1|SubjectÞ; Within each cell type, differential expression (DE) was calculated on each cell state versus all other states and each state against each other state individually. Participants with sufficient nuclei counts (astrocytes, microglia: n > 50; excitatory neurons, inhibitory neurons, oligodendrocytes: n > 60; OPCs: n > 66) in the cell type cluster were included in the analysis. Thresholds were determined by natural breaks in the count distributions for each cell type.

Genetic factor differential expression
Differentially expressed genes were identified within each cell type by genetic status, namely ADAD, TREM2, or sAD vs. Controls and APOEε4− vs. APOEε4+. The nuclei were isolated for each group (ADAD, TREM2 variant carriers, sAD, and APOEε4+). Each group was compared to controls (neuropath. free controls or APOEε4−) using linear mixed models as explained above. Samples with sufficient nuclei counts (astrocytes, microglia: n > 50; excitatory neurons, inhibitory neurons, oligodendrocytes: n > 60; OPCs: n > 66) in the cell type cluster were included in the analysis. The following model was used: Expression ∼ GeneticStatus + AOD + Sex + ð1|SubjectÞ; Age of death (AOD) was not included when analyzing ADAD. This model was run on the entire cell type and each cell state within the cell types. Only non-TREM2 sAD samples were included in the APOEε4+ analysis. Only carriers of TREM2 variants p.R47H, p.R62H, and p.H157Y were included in the TREM2 compared to controls analyses.

Handling of genetically related individuals
The data generated is primarily from unrelated donors, but also includes data from related samples from three nuclear families. In more detail, two pairs from two sibships from the Knight ADRC, and one pair of related donors from the DIAN cohort (sAD-family, sADpresymptomatic family, and ADAD-family; Supplementary Dataset 26). To confirm that this would not bias the differential expression results, we ran analyses with both samples in a family retained and one sample from the family removed. Using the largest (oligodendrocytes) and one of the smallest (microglia) cell types, we tested ADAD compared to controls (ADAD-family, sample40 dropped) and sAD compared to controls (sAD-family, sample59 dropped) and calculated the correlation of the estimates and -log10-transformed p values between the "retained" and "removed" analyses using cor.test in R (Supplementary Fig. 19).

Overlapping genes
Overlapping DEGs from the genetic factor differential expression analyses were identified using two approaches. First, the results were grouped by genetic factor, and the cell types were overlapped (Supplementary Fig. 5 and Supplementary Dataset 13). Second, the results were grouped by cell type, and the genetic factors were overlapped (Supplementary Fig. 5 and Supplementary Dataset 15). Overlapping gene sets that included three or more overlaps were run through gene enrichment analysis using enrichR (v2.1; hosted by the Ma'ayan Laboratory 92,93 ) (Supplementary Datasets 14, 16). Overlaps were visualized using the ComplexUpset (v1.3.3) library in R.

Strength and direction of effects
To see which genes are simply more strongly differentially expressed in ADAD samples versus those unique to ADAD samples, the multipletest-corrected significant DEGs for sAD, TREM2, and ADAD samples compared to controls were isolated. The three gene lists were merged (union), and the estimates for each gene from each comparison were extracted. Distributions of the estimates split by sAD, TREM2, and ADAD were plotted for each cell type's nominally significant genes in each group. Heatmaps were created using the top 500 strongest estimates for each cell type (Supplementary Fig. 3). The heatmaps were visually inspected for gene modules, and the modules were run through gene enrichment analysis using enrichR ( Supplementary  Datasets 12, 27). These heatmaps were summarized to 25 genes in Fig. 2 by taking the genes with the strongest estimates from each module while maintaining the correct proportion of genes for each module.
The effects between sAD vs. controls and ADAD vs. controls were also directly compared by cell type using the results from the 'Genetic factor differential expression' analyses described above. Only the analyses run on the entire cell type population were interrogated. The results of these analyses were filtered to include only genes with nominal associations (P < 0.05) in both analyses to ensure that there could be confidence in the effect sizes for accurate comparison. The correlation coefficient was calculated using R's native cor function.

Pathway analyses
The upregulated genes identified for each cell state were used in a subsequent pathway analysis. We used the R-based application enrichR. We used gene sets to determine pathway enrichment using the "KEGG 2021 Human" or "GO Biological Process 2021" gene sets. Downregulated genes were also run in the APOE-high neuron analysis.
A heatmap was created to summarize the upregulated "GO Biological Process 2021" (GO_BP) hits for all cell states. All GO_BP term -log10 P values for each cell state were merged into a single table. The union of the ten highest GO_BP terms for each cell state were then ranked (Rank) by finding the averaged transformed P value across all cell states for each term. The terms were then loaded into rrvgo (v1.6.0), an R package that implements the Revigo 94 tool for summarizing GO_BP terms. The function calculateSimMatrix was used to calculate the relationships between the GO_BP terms with variable inputs: orgdb = "org.Hs.eg.db", ont = "BP", method = "Rel". The terms were then summarized using reduceSimMatrix and the following variables: score = "Rank", threshold = .6, orgdb = "org.Hs.eg.db". The summarized terms or "parentTerms" and their P values were then used to make the heatmap. Rows were ordered using dist function method = "euclidean", and columns were ordered using method = "p" for Pearson correlation. Additional GO_BP terms were manually removed that showed similar signatures across the cell states and implied the same biological processes (compare Fig. 3g with Supplementary Fig. 6).

Microglia expression states
We collected 12 gene sets (Aging 58 , Homeostatic 95 , Lipid-dropletaccumulating 96 , Neurodegenerative 39 , Proliferative-region-associated 97 , Injury-responsive 98 , Activated-response 99 , Interferon-response 99 , Human-Alzheimer's 40 , Disease-associated 38 , TREM2 79 , and Granulin 79 microglia) associated with different microglia functional states that had been described in the literature. Each set was split into its up and downregulated gene lists. A hypergeometric test was performed using R's native phyper function to identify which previously reported microglial transcriptional states were recapitulated in this dataset.

Neuronal APOE and MHC-I coexpression
We followed the methods outlined by ref. 100. Briefly, MAGIC 101 (v3.0.0) was used to impute gene expression, APOE expression greater than two standard deviations marked APOE-high expression, and the genes HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, and B2M were summed to represent MHC-I expression. A total of 61 samples were analyzed (controls = 9, presymptomatic = 2, sAD = 28, ADAD = 15, others = 7). The correlation between APOE and MCH-I expression was calculated using the native cor.test function in R. Genes differentially expressed between APOE-high and APOE-low neurons were identified using linear mixed models as previously described (nebula R package; model: expression~APOEHigh + Sex). These DE analyses were performed only on the IN.0 and IN.2 cell states split by AD condition. Enriched pathways in the significantly upregulated genes were identified using enrichR as previously described.

ADAD-specific microglia cluster validation
To confirm the biological existence of Mic-stress (Mic.4), a microglia expression state observed in ADAD samples, single microglia cells (n = 1551) from 5xFAD mice were collected as described by ref. 22. Protein-encoding mouse genes were converted to their human orthologs using biomaRt (v2.42.1). The data were normalized using SCTransform, regressing out numbers of genes and UMIs. Seurat's best practices workflow was followed to integrate the mouse and human microglia using the human microglia as the reference dataset. Seven clusters were assigned using FindNeighbors and FindClusters with the first eight principal components and a resolution of 0.2 as input. A total of 297 mouse cells and 1413 of the 1429 human Mic.4 nuclei were recaptured in the post-integration cluster 2. The mouse cells were then re-isolated. A human Mic.4 signature score was calculated for each mouse cell by running the 412 upregulated genes that passed multipletest corrections into Seurat's AddModuleScore. The significant pairwise differences in cluster scores were calculated using the linear mixed model: moduleScore ∼ cluster, ziformula = ∼ 0, family = 'gaussian' ð11Þ executed using glmmTMB (v1.0.1).

TREM2-enriched and rs1582763-enriched microglial cluster validation
Human snRNA-seq data from ROSMAP samples (DLPFC) 20 were used to confirm the Mic-reduced (Mic.2) and Mic-proinflammatory (Mic.3) cell states, enriched for TREM2 p.R47H, p.R62H, and p.H157Y and the rs1582763-A allele respectively. The ROSMAP data has 11 sAD, 11 TREM2 R62H, and 10 control participants with 3986 microglia. The TREM2 p.R47H samples produced by the contributors of the ROSMAP snRNAseq data were analyzed using nanostring and, therefore, not used in replication. The microglia were isolated and normalized using Seurat's SCTransform function with "return.only.var.genes" set to FALSE and regressing out "nCount_RNA" and "nFeature_RNA". The microglia were then integrated using 3000 features in SelectIntegrationFeatures, Pre-pSCTintegration, our data as a reference in FindIntegrationAnchors, and IntegrateData. To identify which nuclei fell into our original clusters, the integrated data were clustered using the first ten principal components as input for FindNeighbors and a resolution of 15 in FindClusters. This shattered the data finding 147 clusters. We assigned each of these 147 clusters an 'original' identity by isolating our cohort of nuclei from the individual clusters and identifying the most common original ID. This ID was transferred to the ROSMAP nuclei, similar to a k-nearest neighbor classifier 102 . These cluster identities were mapped to the pre-integrated normalized ROSMAP data.
To measure the accuracy of our label transfer, cell state signature scores were calculated for each ROSMAP nucleus by running the significantly upregulated genes with estimates greater than 0.25 from each discovery cell state into Seurat's AddModuleScore (Supplementary Dataset 28). This same process was run for the downregulated genes with estimates less than −0.25. The significant pairwise differences between clusters were calculated using the linear mixed model: moduleScore ∼ cluster + ð1|subjectÞ, family = 'gaussian' ð12Þ It was executed using the lmer function from the lme4 (v1.1-23) R package ( Supplementary Fig. 10). The previously described 'Differential proportion analysis' methods were then followed to verify the enrichment of TREM2 p.R62H nuclei in the ROSMAP Mic-reduced (Mic.2) cluster and rs1582763 nuclei in the ROSMAP Micproinflammatory (Mic.3) cluster.

TREM2-enriched oligodendrocyte cluster validation
The two oligodendrocyte clusters (29,478 nuclei) in the ROSMAP data were integrated with our oligodendrocytes and labeled with our original cluster identities as described above. A total of 264 clusters were identified using eight PCA dimensions at a resolution of 15. As described above, the upregulated and downregulated signature scores were calculated on the ROSMAP nuclei using the cell state DEGs identified from the discovery cohort. Pairwise significant differences in the signature score and enrichment of TREM2 p.R62H variant carriers were calculated as described above (Supplementary Fig. 11). The previously described 'Differential proportion analysis' methods were followed to verify the enrichment of TREM2 p.R62H nuclei in the ROSMAP Oligo-TFEB (Oligo.5) cluster.
The same resource files and parameters were used in the ROSMAP cohort replication, employing 7938 out of the 8561 genes used in the discovery cohort, including 894 out of the 896 TFs. Each regulon is represented by a single TF. We tested for an overall concordance in regulon-TFs between the discovery and ROSMAP datasets using the hypergeometric test implemented in the scipy.stats.hypergeom.pmf function (version 1.6.3), using M = 896 total TFs, n = 222 TFs in discovery, N = 190 TFs in ROS-MAP, k = 122 intersecting TFs. We intersected the list of regulons from the discovery and ROSMAP, and then selected those that were differentially expressed in Oligo-TFEB state (Supplementary Dataset 8). Finally, we selected those TFs whose regulons showed a significant overlap among the list of target genes, using the hypergeometric test using M = 8561 total genes used in analysis, n = the number of genes in discovery regulon, N = number of genes in ROSMAP regulon, k = the number of intersecting genes (Supplementary Dataset 19). We employed the Benjamini-Hochberg method (alpha = 0.05) to correct for multiple testing. We then kept the significant regulons that had >2 intersecting target genes. We only used the genes that were differentially expressed in Oligo-TFEB when visualizing the network.

Data visualization browser
We developed the Single Nucleus Alzheimer disease RNAseq Explorer (SNARE) to host the single nucleus expression data using the cellxgene 105 platform. It can be publicly and freely accessed at http:// web.hararilab.org/SNARE/. We include the UMAP representations for the full dataset (consists of all cell types) and the individual cell type subclustering.

Characterization of loci identified in AD risk GWAS
A list of genes identified through AD GWAS was collected. We started with 89 genes from 38 different loci that were previously prioritized 13,61,69 . We then added in all 1240 genes from all the significant loci. This list was filtered by genes detected in our snRNA-seq dataset, leaving 530 genes (77 of the prioritized genes). The significant cell state DE results were queried for the genes within our curated lists and extracted by cell type. We did not include results from direct comparisons with cell states that contained less than 5% of that cell type's nuclei. We removed gene hits with a negative estimate for DE analyses that were a cell state against all others (e.g., mic.1 vs all other mic cell states). We calculated the log2 fold-change from the estimates provided by nebula by using this equation: We selected the maximum log fold-change in our plots if a gene had multiple hits within a single cell type. For each gene in our gene list, we calculated the mean expression in each cell type.

Replication of GWAS loci characterization
SnRNA-seq data from UCI MIND's ADRC was accessed from Synapse 17 . The data was loaded into a Seurat object and normalized using SCTransform with "nCount_RNA" and "nFeature_RNA" as regression variables. Using just the 77 prioritized genes from above, we ran differential expression analysis between cell states and between each cell state and all other cell states at once within each cell type using the cluster annotations provided with the data. As before, the nebula 91 package was used for linear mixed models on the SCTransformed expression counts with cluster and sex as fixed effects and sample as a random effect. Following the processes above, we calculated the maximum log2FC and the average expression of these genes in each cell type. Genes with a p value less than 0.05 were included in the maximum log2FC calculation.
Average expression was log10 transformed and Pearson correlations were calculated for all genes and cell types at once and for each cell type individually using cor.test in R (Supplementary Fig. 14). Maximum log2FC values were converted to binary format (0 = gene not differential expressed between cell states, 1 = differentially expressed) for comparison creating 2 × 2 tables that could be passed to fisher exact tests (fisher.test in R).

snATAC-seq data processing
We used CellRanger (version 6.1.1) to process the publicly available snATAC-seq fastq files on Synapse from the UCI MIND ADRC 17 . Reads were mapped to the GRCh38 reference obtained from https://cf. 10xgenomics.com/supp/cell-arc/refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz. We first filtered the BAM file outputted by Cell Ranger using samtools view with flags -f 3 -F 4 -F 8 -F 256 -F 1024 -F 2048 -q 30. We then used a custom python script to subset the filtered BAM file into one pseudo-bulk BAM file for each cell type using the original cluster assignments provided by the authors. We used MACS 106 (version 2.2.7.1) with flags-nomodel --shift -100 --extsize 200 --keep-dup all --call-summits -B to call narrow peaks on each cell type BAM file. We removed any narrow peaks overlapping blacklisted regions in the genome (https://www.encodeproject.org/files/ENCFF356LFX). For each cell type, we generated a matrix of snATAC-seq read counts per barcode for each narrow peak, which was used as input to CICERO 107 (version 1.4.4) to calculate co-accessibility across regions.

snATAC-seq co-accessibility with AD GWAS
We downloaded the genetic fine-mapping results from Supplementary Dataset 8 61 . We used column finemap_prob_nc to obtain the posterior probability of association (PPA) values for the primary (and secondary, when available) credible sets per locus. Using only variants with PPA >0.01, we identified all snATAC-seq peaks co-accessible with these variants in a cell-type-specific manner with an absolute CICERO 107 score ≥0.001. In addition, we identified all transcription start sites (TSS) that overlapped a fine-mapped GWAS variant and a snATAC-seq narrow peak. We used pyGenomeTracks 108 (version 3.7) to visualize the results.

Statistics and reproducibility
All statistical methods and tests used in this paper are described as appropriate in the figure legends, methods, supplementary or main text. All instances where data were excluded from the further analysis are detailed above in the quality control descriptions. No statistical method was used to determine the sample size, but the frequency of the genetic variants in the general population was considered when selecting the sample size. Additionally, power analyses were performed on the anticipated sample size and nuclei counts. We planned to sequence 10,000 nuclei for each of the 74 samples for a total of~750,000 nuclei. We expected to remove 50% of these during QC, leaving~375,000. We expected the largest cell type cluster to make up 60% of these, providing the power (0.95) to detect cluster proportion differences with an effect size of Cohen's f 2 = 2.16 × 10 −4 according to the R package pwr (version 1.3.0). Predicting that the largest subcluster within this cell type would account for 50% of the nuclei, we calculated Cohen's d = 0.03 for the largest of all our analyses. We anticipated the smallest models to include~1000 nuclei, with the smallest subcluster accounting for 20% of these nuclei. This provided power (0.8) to detect f 2 = 0.03 and d = 0.44. Samples were classified into experimental groups on the basis of neuropathological analysis and clinical data. Analyses were controlled for individual-level covariates, including age and sex. Laboratory staff were blinded to sample status during sample preparation. Investigators were not blinded to group allocation during data collection and/or analysis. Knowledge of group allocation was required to perform differential abundance analysis.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The single nucleus data from the Knight ADRC generated in this study have been deposited in the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) with accession number NG00108. The raw single nucleus data from the DIAN brain bank generated in this study are available under restricted access to maintain individual and family confidentiality. These samples contain rare disease-causing variants that could be used to identify the participating individuals and families. Access can be obtained by request through the online resource request system on the DIAN Website: https://dian.wustl.edu/our-research/for-investigators/dian-observatio nal-study-investigator-resources/. As detailed on the website, "Data requests will be reviewed based on the following criteria: (1) Scientific merit and feasibility (e.g., availability of DIAN resources to fulfill the request), (2) appropriateness of the investigator's qualifications and resources to protect the data, (3) appropriateness to DIAN goals/ themes." Additional conditions of access include: "(1) the recipient to cite/reference the grant (Dominantly Inherited Alzheimer Network, U19AG032438) in any presentation or publication that may result from the research, (2) Should publications result from the use of DIAN resources now or in the future, the recipient agrees to notify the DIAN Executive Director with details (reference or PubMedCentral ID#) and provide a copy of the publication so productivity derived from [DIAN] resources can be reported to the funding agency (the National Institute on Aging (NIA)). Such publications require compliance with NIH public access policies and DIAN data sharing/publication policies, (3) Should funding result from this research now or in the future, please notify the DIAN Executive Director with details (grant title, sponsor, number, dollar total, dates) so productivity derived from [DIAN] resources can be reported to NIA, (4) no sharing of data with a third party is allowed without the permission of the DIAN Steering Committee, (5) de-identified DIAN data will be made available to investigators to conduct analyses after approval by the PI and the relevant DIAN Core Leader. Allow 30-60 days for the review process and 30 days for interaction with the Biostatistics Core to provide the dataset." The processed single nucleus RNAseq data generated in this study can be freely viewed at http://web.hararilab.org/SNARE/. The 5xFAD mouse microglia data used in this study are in the Gene Expression Omnibus (GEO database) under the accession number GSE141917. The ROSMAP single nucleus RNA-sequencing data used in this study are available at Synapse under Synapse ID syn21125841. The single-nucleus RNA-sequencing and single-nucleus ATAC sequencing data used in this study from the UCI MIND ADRC are available at Synapse under Synapse ID syn22079621.